A CASA-labelling model using the localisation cue for robust cocktail-party speech recognition

نویسندگان

  • Hervé Glotin
  • Frédéric Berthommier
  • Emmanuel Tessier
چکیده

We propose a new cocktail-party recognition technique based on the coupling of a CASA-labelling method using the TDOA (Time Delay Of Arrival) with multistream recognition. This is an alternative to the classical "segregate and recognise" architecture. First, we have recorded a stereo database ST-NB95 from the mono Numbers95. This is composed of binary mixtures of sentences at 0dB, placed left and right. The probability to get the labels "left" and "right" is assigned to the subband time frames thanks to a mapping function. This depends on the relative level. It is established a priori, using a reference database composed of isolated words recorded in the same condition. We adapt the recognition paradigm to this particular situation. The model WER of binary mixtures is about 50%. This is a great improvement relatively to the WER (73%) of the fullband PLP. We conclude the model is able to recognise the dominant words of a binary mixture.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Casa Front-end Using the Localisation Cue for Segregation and Then Cocktail-party Speech Recognition

We propose and test a cocktail-party recognition technique based on segregation applied before recognition. This CASA front-end uses the TDOA (Time Delay Of Arrival) evaluated within subbands in order to determine the Relative Level (RL) of two competing speech sources. To perform the evaluation of the model, we have recorded a stereo database ST-NB95 from the mono Numbers95. This is composed o...

متن کامل

Evaluation of CASA and BSS models for cocktail-party speech segregation

For speech segregation, a blind separation model (BSS) is tested together with a CASA model which is based on the localisation cue and the evaluation of the time delay of arrival (TDOA). The test database is composed of 332 binary mixture sentences recorded in stereo with a static set-up. These are truncated at 1 second for the simulations. For applying the two models, we cut the frequency doma...

متن کامل

Evaluation of CASA and BSS models for subband cocktail-party speech separation

For speech segregation, a recurrent blind separation model (BSS) is tested together with a CASA model, which is based on the localisation cue and the evaluation of the time delay of arrival (TDOA). The test database is composed of 332 binary mixture sentences recorded in stereo with a static set-up. These are truncated at 1 second for the simulations. For applying the two models, we cut the fre...

متن کامل

Comparative evaluation of CASA and BSS models for subband cocktail-party speech separation

For speech segregation, a blind separation model (BSS) is tested together with a CASA model which is based on the localisation cue and the evaluation of the time delay of arrival (TDOA). The test database is composed of 332 binary mixture sentences recorded in stereo with a static set-up. These are truncated at 1 second for the simulations. For applying the two models, we cut the frequency doma...

متن کامل

Comparative evaluation of CA for subband cocktail-party

For speech segregation, a recurrent blind separation model (BSS) is tested together with a Computational Auditory Scene Analysis (CASA) model, which is based on the localisation cue and the evaluation of the Time Delay Of Arrival (TDOA). The test database is composed of 332 binary mixture sentences recorded in stereo with a static set-up. These are truncated at 1 second for the simulations. For...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999